Auto-detect audio format in OpenAISpeechToTextClient by jozkee · Pull Request #7575 · dotnet/extensions

jozkee · 2026-06-16T21:46:30Z

When the audio stream is not a FileStream, the client now peeks at the leading bytes to detect the format (wav, webm, m4a, mp3) and sets the multipart filename accordingly. This fixes HTTP 400 errors when sending non-MP3 audio (e.g. WAV) in a MemoryStream, since the OpenAI API uses the file extension to determine the audio format.

Add DetectAudioExtension using Span.SequenceEqual for readability
Add integration tests for all OpenAI-supported formats (mp3, wav, m4a, webm)
Add unit tests covering each magic-byte detection branch
Add ExpectedAudioFilename assertion to VerbatimMultiPartHttpHandler

Fixes #7543


…#7543)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR updates OpenAISpeechToTextClient to auto-detect audio format (wav/webm/m4a/mp3) from leading “magic bytes” when the provided audio stream is not a FileStream, and uses the detected extension in the multipart filename so OpenAI can correctly infer the format (fixing 400s for non-MP3 MemoryStream inputs).

Changes:

Add stream-header “magic byte” detection and filename resolution logic in OpenAISpeechToTextClient.
Add unit tests validating filename selection for each supported format and branch.
Add integration coverage for multiple embedded audio formats and enhance multipart handler assertions to validate the uploaded filename.

Reviewed changes

Copilot reviewed 5 out of 9 changed files in this pull request and generated 1 comment.


File	Description
src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAISpeechToTextClient.cs	Adds filename resolution with magic-byte detection for non-`FileStream` inputs.
test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAISpeechToTextClientTests.cs	Adds theory-based unit tests asserting detected multipart filenames for different headers.
test/Libraries/Microsoft.Extensions.AI.Integration.Tests/VerbatimMultiPartHttpHandler.cs	Adds optional filename assertion for multipart “file” fields.
test/Libraries/Microsoft.Extensions.AI.Integration.Tests/SpeechToTextClientIntegrationTests.cs	Adds integration test that exercises auto-detection across multiple audio formats.
test/Libraries/Microsoft.Extensions.AI.Integration.Tests/Microsoft.Extensions.AI.Integration.Tests.csproj	Embeds additional audio resource files used by the new integration test.

Comments suppressed due to low confidence (1)

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAISpeechToTextClient.cs:121

In GetStreamingTextAsync, ResolveFilename(audioSpeechStream) is executed unconditionally even for translation requests, but the translation branch immediately delegates to GetTextAsync(...) (which resolves the filename again). With the new magic-byte peek, this results in redundant header reads/rewinds for translation streaming.

        _ = Throw.IfNull(audioSpeechStream);

        string filename = ResolveFilename(audioSpeechStream);

        if (IsTranslationRequest(options))
        {
            foreach (var update in (await GetTextAsync(audioSpeechStream, options, cancellationToken).ConfigureAwait(false)).ToSpeechToTextResponseUpdates())
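A minimal sketch of the reordering this comment implies: check for translation first and resolve the filename only on the streaming-transcription path, so each request peeks at the header once. The functions below are hypothetical, synchronous stand-ins for ResolveFilename, GetTextAsync, and GetStreamingTextAsync (strings instead of streams, a counter instead of a real header read), written as a C# top-level program. They are not the client's actual code.

```csharp
using System;
using System.Collections.Generic;

// Counts header peeks so the effect of the reorder is observable (illustration only).
int resolveCalls = 0;

// Hypothetical stand-in for ResolveFilename; the real one peeks at the stream header.
string ResolveFilename(string audio)
{
    resolveCalls++;
    return "audio.wav";
}

// Hypothetical stand-in for the non-streaming GetTextAsync, which resolves the filename itself.
string GetText(string audio, bool translate) =>
    $"{(translate ? "translation" : "transcription")} of {ResolveFilename(audio)}";

// Hypothetical stand-in for GetStreamingTextAsync, with the filename resolved after the branch.
IEnumerable<string> GetStreamingText(string audio, bool translate)
{
    if (translate)
    {
        // Delegate first; GetText performs the only header peek on this path.
        yield return GetText(audio, translate: true);
        yield break;
    }

    string filename = ResolveFilename(audio);
    yield return $"streamed transcription of {filename}";
}

foreach (string update in GetStreamingText("clip", translate: true))
{
    Console.WriteLine(update); // translation of audio.wav
}
Console.WriteLine(resolveCalls); // 1
```

Because GetTextAsync already resolves the filename on the translation path, moving the call below the branch removes the duplicate read without changing the filename either path sends.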

jozkee · 2026-06-16T21:55:41Z

+    }
+
+    /// <summary>Detects the audio format extension from the leading bytes of the audio data.</summary>
+    private static string DetectAudioExtension(ReadOnlySpan<byte> header)


For reference, the formats OpenAI supports are mp3, mp4, mpeg, mpga, m4a, wav, and webm. Below are quotes from the specifications behind the matching this method performs:

WAV — RIFF at offset 0, WAVE at offset 8
Source: Microsoft Multimedia Programming Interface and Data Specifications 1.0 (August 1991), referenced from:
https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

Field	Length	Contents
ckID	4	Chunk ID: "RIFF"
cksize	4	Chunk size: 4+n
WAVEID	4	WAVE ID: "WAVE"

And later under Examples, the full structure shows bytes 0–3 = RIFF, bytes 4–7 = size, and the WAVEID field at bytes 8–11 is WAVE.
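The WAV rule can be written in the Span.SequenceEqual style the PR description mentions. This is a sketch as a C# top-level program; IsWav is a hypothetical helper, C# 11 or later is assumed for the u8 string literals, and the cksize field at bytes 4 to 7 is deliberately not validated.

```csharp
using System;
using System.Text;

// Bytes 0-3 must be "RIFF" and bytes 8-11 must be "WAVE"; bytes 4-7 (cksize) are not checked.
static bool IsWav(ReadOnlySpan<byte> header) =>
    header.Length >= 12 &&
    header[..4].SequenceEqual("RIFF"u8) &&
    header.Slice(8, 4).SequenceEqual("WAVE"u8);

Console.WriteLine(IsWav(Encoding.ASCII.GetBytes("RIFF\0\0\0\0WAVEfmt "))); // True
Console.WriteLine(IsWav(Encoding.ASCII.GetBytes("RIFF\0\0\0\0AVI LIST"))); // False
```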

MP3 / MPEG / MPGA — ID3 at offset 0, or frame sync 0xFF 0xE_
Source: http://www.mp3-tech.org/programmer/frame_header.html (authoritative MP3 technical reference, derived from ISO/IEC 11172-3)

Verified citation (exact text):

The first twelve bits (or first eleven bits in the case of the MPEG 2.5 extension) of a frame header are always set to 1 and are called "frame sync".

And the header table shows:

Sign	Length (bits)	Position (bits)	Description
A	11	(31-21)	Frame sync (all bits must be set)
11 set bits = header[0] == 0xFF plus the top 3 bits of header[1] set, i.e. (header[1] & 0xE0) == 0xE0

For ID3v2 tags preceding MP3 data:
Source: https://id3.org/id3v2.3.0 — Section 3.1 "ID3v2 header"

"The first three bytes of the tag are always "ID3" to indicate that this is an ID3v2 tag"
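Both MP3 rules fit in one predicate. A sketch as a C# top-level program, assuming C# 11 or later; IsMp3 is a hypothetical name, not the PR's code.

```csharp
using System;
using System.Text;

// MP3 is accepted when the data starts with an ID3v2 tag ("ID3") or with an MPEG frame
// header whose 11 frame-sync bits are all set: 0xFF, then the top 3 bits of the next byte.
static bool IsMp3(ReadOnlySpan<byte> header) =>
    (header.Length >= 3 && header[..3].SequenceEqual("ID3"u8)) ||
    (header.Length >= 2 && header[0] == 0xFF && (header[1] & 0xE0) == 0xE0);

Console.WriteLine(IsMp3(Encoding.ASCII.GetBytes("ID3"))); // True
Console.WriteLine(IsMp3(new byte[] { 0xFF, 0xFB, 0x90, 0x00 })); // True
Console.WriteLine(IsMp3(new byte[] { 0xFF, 0xC0 })); // False
```

Checking only the 11 frame-sync bits is permissive: other MPEG audio streams that start with the same set bits, such as MPEG-1 Layer II or ADTS-framed AAC, also pass. Inspecting the version and layer bits in header[1] would narrow the match if that matters.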

MP4 / M4A — ftyp at offset 4
Source: W3C Note "ISO BMFF Byte Stream Format" (referencing ISO/IEC 14496-12 "ISO Base Media File Format"):
https://www.w3.org/TR/mse-byte-stream-format-isobmff/

Verified citation (exact text):

An ISO BMFF initialization segment is defined in this specification as a single File Type Box (ftyp) followed by a single Movie Box (moov).

Per ISO 14496-12 box format: bytes 0–3 = box size (uint32 big-endian), bytes 4–7 = box type (FourCC). The first box MUST be ftyp.
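The box layout above can be read with BinaryPrimitives. A sketch as a C# top-level program, assuming C# 11 or later; ReadFtypBoxSize is a hypothetical helper that returns the leading box's size when its type is ftyp. The check matches any ISO BMFF file, so .mp4 and .m4a look the same here; the major brand at bytes 8 to 11 (for example "M4A ") is where they would differ, and this sketch does not inspect it.

```csharp
using System;
using System.Buffers.Binary;

// ISO BMFF box header: bytes 0-3 = size (big-endian uint32), bytes 4-7 = type (FourCC).
// Returns the size of the leading box when its type is "ftyp", otherwise null.
static uint? ReadFtypBoxSize(ReadOnlySpan<byte> header)
{
    if (header.Length < 8 || !header.Slice(4, 4).SequenceEqual("ftyp"u8))
    {
        return null;
    }

    return BinaryPrimitives.ReadUInt32BigEndian(header);
}

byte[] m4aHeader = { 0x00, 0x00, 0x00, 0x20, (byte)'f', (byte)'t', (byte)'y', (byte)'p', (byte)'M', (byte)'4', (byte)'A', (byte)' ' };
Console.WriteLine(ReadFtypBoxSize(m4aHeader)); // 32
```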

WebM — 0x1A 0x45 0xDF 0xA3 at offset 0
Source: RFC 8794 — "Extensible Binary Meta Language" (IETF Standards Track), Section 8.1 "EBML Header":
https://www.rfc-editor.org/rfc/rfc8794.txt

Verified citation (exact text from Section 8.1):

The EBML Header MUST contain a single Master Element with an Element Name of "EBML" and Element ID of "0x1A45DFA3" (see Section 11.2.1)

WebM is a profile of Matroska (RFC 9559), which is an EBML Document Type. Every WebM file begins with the EBML Header whose first element has ID 0x1A45DFA3.
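Because the EBML element ID is four bytes, the WebM test reduces to one big-endian comparison. A sketch as a C# top-level program; StartsWithEbmlHeader is a hypothetical helper. It accepts any EBML document, including Matroska (.mkv) files; telling WebM apart would mean parsing the DocType element inside the EBML header, which this does not do.

```csharp
using System;
using System.Buffers.Binary;

// The EBML header element ID 0x1A45DFA3 occupies the first four bytes, big-endian.
static bool StartsWithEbmlHeader(ReadOnlySpan<byte> header) =>
    header.Length >= 4 && BinaryPrimitives.ReadUInt32BigEndian(header) == 0x1A45DFA3u;

Console.WriteLine(StartsWithEbmlHeader(new byte[] { 0x1A, 0x45, 0xDF, 0xA3, 0x9F })); // True
Console.WriteLine(StartsWithEbmlHeader(new byte[] { 0x1A, 0x45, 0xDF })); // False
```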

dotnet-comment-bot · 2026-06-16T22:57:18Z

‼️ Found issues ‼️

Project	Coverage Type	Expected	Actual
Microsoft.Extensions.Diagnostics.Testing	Line	99	98.65 🔻
Microsoft.Extensions.Telemetry	Line	93	91.95 🔻
Microsoft.Extensions.AI	Line	89	88.53 🔻
Microsoft.Extensions.AI	Branch	89	88.57 🔻
Microsoft.Extensions.AI.OpenAI	Line	75	62.86 🔻
Microsoft.Extensions.AI.OpenAI	Branch	75	50.31 🔻
Microsoft.Extensions.DataIngestion.MarkItDown	Line	75	4.46 🔻
Microsoft.Extensions.DataIngestion.MarkItDown	Branch	75	0 🔻
Microsoft.Extensions.Diagnostics.ResourceMonitoring	Line	99	96.03 🔻
Microsoft.Extensions.Diagnostics.ResourceMonitoring	Branch	99	94.39 🔻
Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes	Line	99	97.73 🔻
Microsoft.Extensions.ServiceDiscovery.Dns	Line	75	69.93 🔻
Microsoft.Extensions.ServiceDiscovery.Abstractions	Line	75	42.11 🔻
Microsoft.Extensions.ServiceDiscovery.Abstractions	Branch	75	42.86 🔻
Microsoft.Extensions.ServiceDiscovery	Line	75	67.81 🔻
Microsoft.Extensions.ServiceDiscovery	Branch	75	71.43 🔻
Microsoft.Extensions.ServiceDiscovery.Yarp	Line	75	73.85 🔻
Microsoft.Extensions.ServiceDiscovery.Yarp	Branch	75	70 🔻
Microsoft.Extensions.VectorData.Abstractions	Line	75	37.39 🔻
Microsoft.Extensions.VectorData.Abstractions	Branch	75	22.73 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project	Expected	Actual
Microsoft.Gen.BuildMetadata	97	100
Microsoft.Gen.MetadataExtractor	57	73
Microsoft.Gen.MetricsReports	67	69
Microsoft.Extensions.AI.Abstractions	82	85
Microsoft.Extensions.AI.Evaluation.NLP	0	78
Microsoft.Extensions.Caching.Hybrid	82	89
Microsoft.Extensions.DataIngestion	75	89
Microsoft.Extensions.DataIngestion.Markdig	75	90
Microsoft.Extensions.Http.Resilience	97	100

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1467244&view=codecoverage-tab

tarekgh · 2026-06-17T22:12:33Z

+            int bytesRead = 0;
+            while (bytesRead < header.Length)
+            {
+                int n = audioSpeechStream.Read(header, bytesRead, header.Length - bytesRead);


Are we sure the stream is positioned at the beginning to ensure we are reading the header?
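One possible answer, sketched under the assumption that the caller has already positioned the stream at the start of the audio data: read from the current position and restore exactly that position instead of seeking to 0, and skip the peek for streams that cannot seek. PeekHeader is a hypothetical helper written as a C# top-level program, not the PR's implementation.

```csharp
using System;
using System.IO;
using System.Text;

// Reads up to header.Length bytes from the stream's *current* position, then restores
// that position. Returns the number of bytes actually read (0 for non-seekable streams).
static int PeekHeader(Stream stream, byte[] header)
{
    if (!stream.CanSeek)
    {
        return 0; // Cannot rewind, so do not consume any bytes.
    }

    long start = stream.Position;
    int bytesRead = 0;
    while (bytesRead < header.Length)
    {
        int n = stream.Read(header, bytesRead, header.Length - bytesRead);
        if (n == 0)
        {
            break; // End of stream before the buffer filled.
        }
        bytesRead += n;
    }

    stream.Position = start;
    return bytesRead;
}

var audio = new MemoryStream(Encoding.ASCII.GetBytes("xxRIFF\0\0\0\0WAVE"));
audio.Position = 2; // The caller already consumed two bytes of unrelated data.
byte[] buffer = new byte[12];
Console.WriteLine(PeekHeader(audio, buffer)); // 12
Console.WriteLine(audio.Position); // 2
```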

jeffhandley · 2026-06-17T22:06:29Z

+            int bytesRead = 0;
+            while (bytesRead < header.Length)
+            {
+                int n = audioSpeechStream.Read(header, bytesRead, header.Length - bytesRead);


Can you use Stream.ReadExactly here, or does that prevent you from being able to reliably rewind after reading in an unsuccessful read-exactly scenario?

Perhaps Stream.ReadAtLeast, with throwOnEndOfStream set to false, would be appropriate?

But then again, maybe just having this loop here is the cleanest option, as you're not working against what those convenience APIs are trying to do.
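For comparison, a sketch of the ReadAtLeast variant, assuming .NET 7 or later where Stream.ReadAtLeast(Span&lt;byte&gt;, int, bool) is available. With throwOnEndOfStream: false it returns fewer bytes instead of throwing at end of stream, so the rewind always runs. PeekHeader is a hypothetical helper. If the library also targets older frameworks such as netstandard2.0, this API is not available there, which is an argument for keeping the manual loop.

```csharp
using System;
using System.IO;

// Hypothetical helper: fills as much of `header` as the stream allows, then rewinds.
static int PeekHeader(Stream stream, Span<byte> header)
{
    if (!stream.CanSeek)
    {
        return 0; // Cannot rewind, so read nothing.
    }

    long start = stream.Position;
    // Reads until the header is full or the stream ends; does not throw at end of stream.
    int bytesRead = stream.ReadAtLeast(header, header.Length, throwOnEndOfStream: false);
    stream.Position = start;
    return bytesRead;
}

var shortStream = new MemoryStream(new byte[] { 0x49, 0x44, 0x33 });
Console.WriteLine(PeekHeader(shortStream, new byte[12])); // 3
Console.WriteLine(shortStream.Position); // 0
```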

tarekgh · 2026-06-17T22:14:04Z

+            }
+
+            audioSpeechStream.Position -= bytesRead;
+            return $"audio.{DetectAudioExtension(header.AsSpan(0, bytesRead))}";


What happens if we get an unrecognized format?
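A possible answer, sketched as a hypothetical DetectAudioExtension that folds the four signature checks from the spec notes above into one C# top-level function and returns a caller-chosen fallback when nothing matches. Defaulting to mp3 is an assumption: it presumes the client's earlier filename used an .mp3 extension, which the PR description implies (MP3 already worked) but does not state, and the source does not say which fallback the PR uses. C# 11 or later is assumed for u8 literals.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch: returns a known extension, or `fallback` when no signature matches.
static string DetectAudioExtension(ReadOnlySpan<byte> header, string fallback = "mp3")
{
    if (header.Length >= 12 && header[..4].SequenceEqual("RIFF"u8) && header.Slice(8, 4).SequenceEqual("WAVE"u8))
        return "wav";
    if (header.Length >= 4 && BinaryPrimitives.ReadUInt32BigEndian(header) == 0x1A45DFA3u)
        return "webm";
    if (header.Length >= 8 && header.Slice(4, 4).SequenceEqual("ftyp"u8))
        return "m4a";
    if ((header.Length >= 3 && header[..3].SequenceEqual("ID3"u8)) ||
        (header.Length >= 2 && header[0] == 0xFF && (header[1] & 0xE0) == 0xE0))
        return "mp3";

    // Unrecognized signature: return the fallback instead of throwing, leaving the
    // service to accept or reject the data.
    return fallback;
}

Console.WriteLine(DetectAudioExtension(new byte[] { 0x1A, 0x45, 0xDF, 0xA3 })); // webm
Console.WriteLine(DetectAudioExtension(new byte[] { 0x00, 0x01, 0x02 })); // mp3
```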

jozkee requested a review from rogerbarreto June 16, 2026 21:46

jozkee self-assigned this Jun 16, 2026

jozkee requested a review from a team as a code owner June 16, 2026 21:46

Copilot AI review requested due to automatic review settings June 16, 2026 21:46

jozkee added the area-ai Microsoft.Extensions.AI libraries label Jun 16, 2026

Copilot started reviewing on behalf of jozkee June 16, 2026 21:47

Copilot AI reviewed Jun 16, 2026


Comment thread test/Libraries/Microsoft.Extensions.AI.Integration.Tests/VerbatimMultiPartHttpHandler.cs

jozkee commented Jun 16, 2026


tarekgh reviewed Jun 17, 2026


jeffhandley approved these changes Jun 17, 2026


tarekgh reviewed Jun 17, 2026


jozkee wants to merge 1 commit into main from issue-7543

jozkee commented Jun 16, 2026 • edited by dotnet-policy-service Bot

5 participants
